WHAT IS CLAIMED IS:

1. A method for predicting at least one property of a candidate molecule,
said method comprising:

classifying a set of reference molecules as either possessing or not
possessing the at least one property;

selecting a subset of said set of reference molecules, wherein all of the
molecules in said subset possess the at least one property;

selecting a plurality of marker molecules from the subset, said plurality
of marker molecules being less in number than the number of molecules in said
subset; and

comparing structural characteristics of said candidate molecule with
structural characteristics of at least one of said marker molecules.
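
The comparing step of Claim 1 can be illustrated with a minimal sketch. The claim does not name a representation or a metric, so the sketch assumes that each molecule is described by a set of integer structural feature identifiers and that the comparison is a Tanimoto coefficient. The names Fingerprint, tanimoto, compare_to_markers, and the example feature sets are hypothetical.

```python
from typing import Dict, FrozenSet

# Assumption: a molecule's structural characteristics are a set of feature ids.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features divided by all features present."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_to_markers(candidate: Fingerprint,
                       markers: Dict[str, Fingerprint]) -> Dict[str, float]:
    """Compare the candidate with every marker molecule."""
    return {name: tanimoto(candidate, fp) for name, fp in markers.items()}


# Hypothetical example data, not taken from the application.
markers = {
    "marker_a": frozenset({1, 2, 3, 4}),
    "marker_b": frozenset({7, 8}),
}
candidate = frozenset({2, 3, 4, 5})
scores = compare_to_markers(candidate, markers)
```

Under these assumptions the candidate shares three of five distinct features with marker_a and none with marker_b.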

2. The method of Claim 1, wherein the at least one property is high protein
binding.

3. A method of selecting a set of marker molecules for structural
comparisons in a model for molecular behavior prediction, said method comprising:

classifying a set of reference molecules as either possessing or not
possessing the at least one property;

selecting a subset of said set of reference molecules, wherein all of the
molecules in said subset possess the at least one property;

selecting a plurality of marker molecules from the subset, said plurality
of marker molecules being less in number than the number of molecules in said
subset.

4. The method of Claim 3, wherein said subset comprises all of the
molecules in said set that possess said at least one property.

5. The method of Claim [illegible], wherein said selecting a plurality of marker
molecules comprises: 

comparing all molecules in said set with all other molecules in said set in
accordance with a pre-defined numerical similarity metric;

selecting a first molecule of said subset;

sorting all other molecules of said set in descending order of numerical
similarity to said first molecule, thereby defining a similarity distance in terms
of number of molecules between said first molecule and each other molecule of
the set;

defining, for each range in molecules of similarity distance away from
said first molecule, a fractions-correctly-predicted metric as the number of
molecules in said range which are also members of said subset divided by the
total number of molecules in said range;

counting the number of molecules away from said first molecule at
which the fractions-correctly-predicted for said first molecule drops below a
threshold value;

repeating said selecting, sorting, defining, and counting steps for all other 
molecules of said subset; 

choosing, as said set of marker molecules, those molecules of said subset
having a fractions-correctly-predicted metric which exceeds said threshold value
for a pre-selected minimum distance.
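
The selecting, sorting, defining, counting, and choosing steps above can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes that a "range" of similarity distance k means the k nearest neighbours of the first molecule, and that the count stops at the first k where the fraction falls below the threshold. The names reach and select_markers, the integer molecules, and the toy similarity function are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Set

Similarity = Callable[[Hashable, Hashable], float]


def reach(first: Hashable, molecules: Sequence[Hashable],
          positives: Set[Hashable], similarity: Similarity,
          threshold: float) -> int:
    """Count neighbours before the fraction of positives drops below threshold."""
    # Sort all other molecules in descending order of similarity to `first`.
    others = sorted((m for m in molecules if m != first),
                    key=lambda m: similarity(first, m), reverse=True)
    hits = 0
    for k, mol in enumerate(others, start=1):
        hits += mol in positives          # members of the subset in this range
        if hits / k < threshold:          # fraction for the k nearest neighbours
            return k - 1
    return len(others)


def select_markers(molecules: Sequence[Hashable], positives: Set[Hashable],
                   similarity: Similarity, threshold: float,
                   min_distance: int) -> List[Hashable]:
    """Keep subset members whose fraction holds for at least min_distance."""
    return [m for m in molecules
            if m in positives
            and reach(m, molecules, positives, similarity, threshold) >= min_distance]


# Toy data (assumption): molecules are points on a line, closer means more similar.
molecules = [0, 1, 2, 5, 6, 7, 8]
positives = {0, 1, 2, 6}
toy_similarity = lambda a, b: -abs(a - b)

chosen = select_markers(molecules, positives, toy_similarity,
                        threshold=0.8, min_distance=2)
```

In this toy set, molecule 6 possesses the property but its nearest neighbours do not, so it is not kept as a marker, while 0, 1, and 2 are.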

6. The method of Claim 5, additionally comprising repeating said counting 
step for a plurality of different threshold values. 

7. The method of Claim 6, comprising repeating said choosing step at a 
plurality of different threshold values and minimum distances so as to select a plurality 
of preliminary sets of marker molecules. 

8. The method of Claim 7, comprising choosing a final set of marker
molecules by making molecular behavior predictions for all molecules in said set using
each one of said preliminary sets of marker molecules, and choosing as said final set of
marker molecules the preliminary set that most accurately predicts molecular behavior
of molecules of said set.
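
The selection of a final set in Claims 7 and 8 can be sketched as a search over preliminary sets keyed by (threshold value, minimum distance). The claims do not specify how a prediction is made from a marker set, so the predictor is passed in as a parameter. The names accuracy, choose_final_set, toy_predict, and the example data are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Predictor = Callable[[List[Hashable], Hashable], bool]
Key = Tuple[float, int]  # (threshold value, minimum distance)


def accuracy(markers: List[Hashable], molecules: Sequence[Hashable],
             positives: Set[Hashable], predict: Predictor) -> float:
    """Fraction of reference molecules whose known class is predicted correctly."""
    correct = sum(predict(markers, m) == (m in positives) for m in molecules)
    return correct / len(molecules)


def choose_final_set(preliminary: Dict[Key, List[Hashable]],
                     molecules: Sequence[Hashable], positives: Set[Hashable],
                     predict: Predictor) -> Tuple[Key, List[Hashable]]:
    """Return the preliminary marker set with the highest accuracy."""
    return max(preliminary.items(),
               key=lambda item: accuracy(item[1], molecules, positives, predict))


# Toy data (assumption): integer molecules, positive if within 1 of any marker.
molecules = [0, 1, 2, 5, 6, 7, 8]
positives = {0, 1, 2, 6}
toy_predict = lambda markers, m: any(abs(m - k) <= 1 for k in markers)

preliminary = {
    (0.8, 2): [0, 1, 2],
    (0.5, 1): [0, 1, 2, 6],
}
best_key, best_markers = choose_final_set(preliminary, molecules,
                                          positives, toy_predict)
```

With this toy predictor the smaller set classifies six of seven reference molecules correctly and the larger set five of seven, so the smaller set is chosen.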

9. A method of predicting whether or not a molecule will be highly protein
bound in serum, said method comprising:

numerically defining the structural similarity of said molecule to a
plurality of marker molecules, all of which are known to be highly protein bound
in serum;

comparing said structural similarities to a corresponding plurality of
numerical thresholds associated with each of said plurality of marker molecules.

10. The method of Claim 9, wherein said molecule is categorized as highly
protein bound if any one of the numerically defined structural similarities exceeds any
corresponding one of said numerical thresholds.
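
The categorization rule of Claim 10 can be sketched as a comparison of each similarity with the threshold of its own marker. The function name is_highly_bound, the marker names, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict


def is_highly_bound(similarities: Dict[str, float],
                    thresholds: Dict[str, float]) -> bool:
    """True if any similarity exceeds the threshold of the same marker."""
    return any(similarities[name] > thresholds[name] for name in thresholds)


# Hypothetical per-marker thresholds and similarity values.
thresholds = {"marker_a": 0.70, "marker_b": 0.35}
close_to_b = {"marker_a": 0.62, "marker_b": 0.40}
close_to_none = {"marker_a": 0.62, "marker_b": 0.30}

first_result = is_highly_bound(close_to_b, thresholds)
second_result = is_highly_bound(close_to_none, thresholds)
```

Because each marker carries its own threshold, a similarity of 0.40 can trigger the category for one marker while 0.62 does not for another.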

11. The method of Claim 9, additionally comprising comparing the logP of
said molecule to a logP threshold.

12. The method of Claim 11, wherein said molecule is categorized as highly
protein bound if (1) any one of the numerically defined structural similarities exceeds
any corresponding one of said numerical thresholds, or (2) the logP of said molecule
exceeds said logP threshold.
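
The two-part test of Claim 12 can be sketched as a logical OR of the similarity test and the logP test. The sketch assumes the similarities and thresholds are given as sequences in the same marker order. The function name categorize and the numeric values, including the logP threshold of 4.0, are hypothetical.

```python
from typing import Sequence


def categorize(similarities: Sequence[float], thresholds: Sequence[float],
               logp: float, logp_threshold: float) -> bool:
    """Highly protein bound if either the similarity or the logP test passes."""
    # Test (1): some similarity exceeds the threshold of the same marker.
    similar = any(s > t for s, t in zip(similarities, thresholds))
    # Test (2): the logP of the molecule exceeds the logP threshold.
    lipophilic = logp > logp_threshold
    return similar or lipophilic


# Hypothetical values: two markers and an assumed logP threshold of 4.0.
marker_thresholds = [0.5, 0.6]
by_logp_only = categorize([0.2, 0.3], marker_thresholds, 5.1, 4.0)
by_neither = categorize([0.2, 0.3], marker_thresholds, 2.0, 4.0)
by_similarity_only = categorize([0.7, 0.3], marker_thresholds, 2.0, 4.0)
```

Either test alone is sufficient, so a molecule dissimilar to every marker is still categorized as highly protein bound when its logP is above the threshold.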

13. A system for predicting molecular activity, said system comprising:

one or more memories having stored thereon (1) structural information
for a plurality of marker molecules, all of which possess a selected biological or
chemical activity, (2) a numerical similarity threshold assigned to each of said
plurality of marker molecules, and (3) structural information for at least one
candidate molecule;

a processor configured to (1) structurally compare said at least one
candidate molecule to all of said plurality of marker molecules to produce a set
of numerical similarity metrics, and (2) compare said numerical similarity
metrics with said numerical similarity thresholds.

14. The system of Claim 13, wherein said selected biological or chemical
activity comprises high protein binding.

15. A system for predicting propensity for protein binding in a candidate
molecule, said system comprising:

one or more memories storing structural information related to a plurality
of marker molecules, all of which are known to be highly protein bound, and
storing structural information related to said candidate molecule;

means for numerically defining the structural similarity of said candidate
molecule to said plurality of marker molecules; and

means for comparing said structural similarities to a corresponding
plurality of numerical thresholds associated with each of said plurality of marker
molecules.

16. The system of Claim [illegible], additionally comprising means for estimating
the logP of said candidate molecule and comparing said estimated logP to a threshold.